Are we getting interactions wrong?
The role of link functions
in psychological research

Laura Sità, Margherita Calderan, Tommaso Feraco,
Filippo Gambarota, Enrico Toffalini

For those who want to try running the simulations in real time:

[QR code linking to https://github.com/sitalaura/link-functions/tree/main/R]

or download the file from sitalaura.github.io/link-functions/R/datasim.R

1 Example

Simulated dataset 1

independent variable: age in years (age)

dependent variable: a continuous, unbounded outcome (y)

[dataset screenshot to be added]

Linear model

using the classical linear predictor

fitL = glm(y~age, data=d)

Linear model

What we don't see, because it is a default argument, is what is actually hidden in our code:

fitL_explicit = glm(y~age, family=gaussian(link="identity"), data=d)

The model uses the Gaussian family and the identity link function.
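A minimal sketch of this equivalence, using a small hypothetical dataset (`d_demo`, its sample size, and its coefficients are illustrative assumptions, not the dataset used in these slides):

```r
set.seed(123)

# Hypothetical data: a continuous, unbounded outcome that grows with age
d_demo <- data.frame(age = runif(100, 6, 11))
d_demo$y <- 2 + 0.5 * d_demo$age + rnorm(100)

# glm() with defaults, glm() with the defaults spelled out, and plain lm()
fit_default  <- glm(y ~ age, data = d_demo)
fit_explicit <- glm(y ~ age, family = gaussian(link = "identity"), data = d_demo)
fit_lm       <- lm(y ~ age, data = d_demo)

# All three calls estimate the same coefficients
all.equal(coef(fit_default), coef(fit_explicit))
all.equal(coef(fit_default), coef(fit_lm))

# The family and link actually used by the default call
family(fit_default)
```

Calling `family()` on any fitted `glm` object is a quick way to check which family and link a model used.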

In a GLM, the link function relates the linear predictor Xβ to the expected value of the response variable Y: its inverse re-maps the linear predictor onto the appropriate range of Y.
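To make the re-mapping concrete, the sketch below applies the inverse of two standard links to the same values of the linear predictor. It uses only `make.link()` from base R; the values in `eta` are arbitrary and chosen for illustration.

```r
# Arbitrary values of the linear predictor (illustrative only)
eta <- c(-10, -1, 0, 1, 10)

id_link    <- make.link("identity")
logit_link <- make.link("logit")

# Inverse link: from the linear predictor to the scale of the response
mu_identity <- id_link$linkinv(eta)     # unchanged, can be any real number
mu_logit    <- logit_link$linkinv(eta)  # squeezed into the (0, 1) interval

data.frame(eta, mu_identity, mu_logit)

# Link function: from the response scale back to the linear predictor
logit_link$linkfun(mu_logit)
```

With the identity link the expected value is the linear predictor itself, so nothing prevents it from leaving the range of a bounded outcome.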

2 Example

Simulated dataset 2

independent variable: age in years (age)

dependent variable: proportion of correct responses in a TRUE/FALSE task (accuracy)

[dataset screenshot to be added]

Linear model

using the classical linear predictor

fitL = glm(accuracy~age, data=d)

Linear model

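library(effects)  # provides allEffects()
library(ggplot2)  # plotting

# Gaussian family with identity link (the glm() defaults)
fitL <- glm(accuracy ~ age, data = d)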

effL <- data.frame(
  allEffects(
    fitL,
    xlevels = list(age = seq(min(d$age), max(d$age), .05))
  )[["age"]]
)

ggplot(d, aes(x = age, y = accuracy)) +
  coord_cartesian(ylim = c(0, 1)) +
  geom_point(size = 4, alpha = .5, color = "darkblue") +
  geom_ribbon(
    data = effL,
    aes(x = age, ymin = lower, ymax = upper),
    alpha = .3, fill = "darkred", color = NA,
    inherit.aes = FALSE
  ) +
  geom_line(
    data = effL,
    aes(x = age, y = fit),
    linewidth = 2, color = "darkred",
    inherit.aes = FALSE
  ) +
  # `ts` is the base text size, assumed to be defined earlier in the session
  theme(text = element_text(size = ts, color = "black")) +
  scale_x_continuous(breaks = seq(floor(min(d$age)), ceiling(max(d$age)), .5)) +
  scale_y_continuous(breaks = seq(0, 1, .1)) +
  ylab("accuracy") + xlab("Age (years)")

Does this model help us predict the data?

Linear model

No: at around 11 years of age the accuracy predicted for the children is nonsensical, because the fitted line is not constrained to stay between 0 and 1.

❌ Inappropriate model

In the first example, an identity link was appropriate because

  • y spans from -∞ to +∞

Here, an identity link is NOT appropriate because

  • y (accuracy) spans from 0 to 1

✅ More appropriate model

fitLogit = glm(accuracy ~ age, data=d, family=binomial(link="logit"), weights= rep(k, nrow(d)))

In this case, link="logit" ensures that the predicted values of y stay between 0 and 1.
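The difference between the two models can be checked on predictions. The sketch below is self-contained and uses hypothetical data: the sample size, the number of trials `k`, and the generating coefficients are assumptions made for illustration, not the values in `datasim.R`.

```r
set.seed(1)

n <- 200  # number of children (assumed)
k <- 20   # number of TRUE/FALSE trials per child (assumed)

# Hypothetical data: accuracy increases with age on the logit scale
d <- data.frame(age = runif(n, 6, 11))
d$accuracy <- rbinom(n, k, plogis(-6 + 1 * d$age)) / k

# Linear model (identity link) vs. binomial model (logit link)
fitL     <- glm(accuracy ~ age, data = d)
fitLogit <- glm(accuracy ~ age, data = d,
                family = binomial(link = "logit"),
                weights = rep(k, nrow(d)))

# Predictions inside and outside the observed age range
newd <- data.frame(age = c(4, 8, 14))
predL     <- predict(fitL, newdata = newd)
predLogit <- predict(fitLogit, newdata = newd, type = "response")

cbind(newd, linear = predL, logit = predLogit)
```

The logit predictions are bounded by construction, whereas nothing constrains the linear predictions, so it is worth inspecting them at the extremes of the age range.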

✅ More appropriate model

effLogit <- data.frame(
  allEffects(
    fitLogit,
    xlevels = list(age = seq(min(d$age), max(d$age), .05))
  )[["age"]]
)

p1 <- ggplot(d, aes(x = age, y = accuracy)) +
  coord_cartesian(ylim = c(0, 1)) +
  geom_point(size = 4, alpha = .5, color = "darkblue") +
  geom_ribbon(
    data = effLogit,
    aes(x = age, ymin = lower, ymax = upper),
    alpha = .25, fill = "blue", color = NA,
    inherit.aes = FALSE
  ) +
  geom_line(
    data = effLogit,
    aes(x = age, y = fit),
    linewidth = 2, color = "blue",
    inherit.aes = FALSE
  ) +
  theme(text = element_text(size = ts, color = "black")) +
  scale_x_continuous(breaks = seq(floor(min(d$age)), ceiling(max(d$age)), .5)) +
  scale_y_continuous(breaks = seq(0, 1, .1)) +
  labs(y = "accuracy", x = "Age (years)")

p1

3 Studying interactions

Simulated dataset 2

independent variable: age in years (age)

dependent variable: proportion of correct responses in a TRUE/FALSE task (accuracy)

adding a new main effect

groups: typically developing children (group = 0) vs. children with dyslexia (group = 1)

What I actually simulated

# simulation code

I did not simulate an interaction, so BOTH models find an interaction that is not there.
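One way such a dataset could be generated is sketched below. This is not the authors' simulation script: the sample size, the number of trials `k`, the coefficients, and the use of a standardised age variable are all assumptions. The key feature is that age and group are purely additive on the probit scale, and the probability of a correct response is bounded below by the 50% chance level.

```r
set.seed(2024)

n <- 1000  # number of children (assumed)
k <- 30    # trials per child (assumed)

age   <- rnorm(n)                     # standardised age (assumed)
group <- factor(rbinom(n, 1, 0.5))    # 0 = control, 1 = dyslexia

# Additive linear predictor: main effects only, NO interaction term
eta <- 2 + 1 * age - 1 * (group == "1")

# True/false task: accuracy cannot fall below chance (0.5)
p <- 0.5 + 0.5 * pnorm(eta)

d <- data.frame(age = age, group = group,
                accuracy = rbinom(n, k, p) / k)

# Fit the two misspecified models with an interaction term
fitL     <- glm(accuracy ~ age * group, data = d)
fitLogit <- glm(accuracy ~ age * group, data = d,
                family = binomial(link = "logit"),
                weights = rep(k, nrow(d)))

# Inspect the interaction row of each model
round(coef(summary(fitL))["age:group1", ], 3)
round(coef(summary(fitLogit))["age:group1", ], 3)
```

Because the generating model has no interaction, any interaction estimated by these two fits reflects the mismatch between the assumed link and the data-generating process.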

The true model that can fit the data: link="mafc.probit"

Let's try the multiple-alternative forced-choice probit link (chance level of 50%, because the task is true/false).

fitM = glm(accuracy ~ age*group, data=d, family=binomial(link=mafc.probit(.m=2)), weights= rep(k, nrow(d)))
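The idea behind this link can be sketched in base R. The functions below are hypothetical helpers written for illustration (they are not part of any package), and they assume the usual m-alternative forced-choice form, in which the probability of a correct response starts at the chance level 1/m and rises towards 1.

```r
# Inverse link: linear predictor -> probability of a correct response
mafc_linkinv <- function(eta, m = 2) 1 / m + (1 - 1 / m) * pnorm(eta)

# Link: probability of a correct response -> linear predictor
mafc_linkfun <- function(p, m = 2) qnorm((m * p - 1) / (m - 1))

eta <- c(-3, 0, 3)

p_mafc  <- mafc_linkinv(eta)  # never below the 0.5 chance level when m = 2
p_logit <- plogis(eta)        # the logit link can go below chance

data.frame(eta, p_mafc, p_logit)

# Round trip: applying the link to the probabilities recovers eta
mafc_linkfun(p_mafc)
```

With m = 2, a linear predictor of 0 corresponds to 75% correct, halfway between chance and perfect performance, while the logit link maps the same value to 50%.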

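The true model that can fit the data: link="mafc.probit"

Using a binomial GLM with the mafc.probit(.m=2) link, the interaction all but disappears, as it should: in the output below, the age:group1 estimate (0.08) is small relative to the main effects, although it still reaches p = .028.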

fitM = glm(accuracy ~ age*group, data=d, family=binomial(link=mafc.probit(.m=2)), weights= rep(k, nrow(d)))

summary(fitM)

Call:
glm(formula = accuracy ~ age * group, family = binomial(link = mafc.probit(.m = 2)), 
    data = d, weights = rep(k, nrow(d)))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.99479    0.03555   56.10   <2e-16 ***
age          0.96551    0.02802   34.46   <2e-16 ***
group1      -0.97744    0.04107  -23.80   <2e-16 ***
age:group1   0.08005    0.03638    2.20   0.0278 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 8442.54  on 999  degrees of freedom
Residual deviance:  805.04  on 996  degrees of freedom
AIC: 2779.2

Number of Fisher Scoring iterations: 6
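For readers who want to see where the interaction row comes from, the Wald z value and p value can be recomputed from the estimate and standard error printed in the summary above. This is a simple arithmetic check; the variable names are illustrative.

```r
# Values copied from the age:group1 row of the summary output
est <- 0.08005
se  <- 0.03638

z <- est / se              # Wald z statistic
p <- 2 * pnorm(-abs(z))    # two-sided p value

# Approximate 95% Wald confidence interval for the interaction
ci <- est + c(-1, 1) * qnorm(0.975) * se

round(c(z = z, p = p), 4)
round(ci, 4)
```

The same calculation applies to any row of a `summary.glm` coefficient table.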

4 Why interactions

Conclusions

Building a model means trying to recover the data-generating process which, unlike in the world of simulations, we can never know for sure.

To do that, we must make important decisions:

choosing the most appropriate family of distributions, so that the predicted values of the dependent variable lie within its bounds

choosing the most appropriate link function: otherwise you are very likely to find non-linear effects (i.e., interactions) that are not there!

We are conducting a systematic review of how often inappropriate link functions are used in psychological research and lead to a significant interaction being found: so far, quite often.

Materials & Contact

All materials are available on GitHub at sitalaura/link-functions

Questions and feedback: laura.sita@studenti.unipd.it


Thank you

Special thanks to

Supplementary materials

Posterior predictive check (logit)